A Theoretical View on Sparsely Activated Networks
Deep and wide neural networks successfully fit very complex functions today, but dense models are starting to be prohibitively expensive for inference. To mitigate this, one promising research direction is networks that activate a sparse subgraph of the network. The subgraph is chosen by a data-dependent routing function, enforcing a fixed mapping of inputs to subnetworks (e.g., the Mixture of Experts (MoE) paradigm in Switch Transformers). However, there is no theoretical grounding for these sparsely activated models. As our first contribution, we present a formal model of data-dependent sparse networks that captures salient aspects of popular architectures.
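The data-dependent routing described above can be sketched in a few lines. This is a toy illustration of Switch-style top-1 routing, not the paper's formal model: all names, sizes, and the linear experts are illustrative assumptions.

```python
import numpy as np

# Toy sketch of data-dependent top-1 routing (Switch-style MoE).
# Sizes and names are illustrative assumptions, not the paper's model.
rng = np.random.default_rng(0)
d, n_experts = 8, 4

# Each "expert" is a small linear map; the router is another linear map
# whose argmax picks exactly one expert per input (the sparse subnetwork).
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
router_w = rng.standard_normal((d, n_experts))

def route_and_apply(x):
    """Send x through the single expert selected by the router."""
    logits = x @ router_w            # data-dependent routing scores
    k = int(np.argmax(logits))       # top-1: activate one expert only
    return k, x @ experts[k]

x = rng.standard_normal(d)
k, y = route_and_apply(x)
```

Only one of the four expert matrices is ever multiplied per input, which is the source of the inference savings: compute scales with the activated subnetwork, not the full parameter count.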
Each convolution kernel is a classifier!
The same can be said for deep Convolutional Neural Networks (CNNs) and Computer Vision: the Convolutional Neural Network is the new electricity for Computer Vision. CNNs are a gem of the Machine Learning community, especially in the rapidly growing field of Computer Vision. They are the backbone of a myriad of fundamental computer vision tasks, from ones as simple as image classification (which was considered very challenging, or even intractable, just over two decades ago) to more complicated ones, including image segmentation, image super-resolution, and image captioning. This is why interest in CNNs has grown in both academia and industry over the past decade. Yet although many engineers and students use CNNs on a day-to-day basis, most lack a comprehensive theoretical view of the Convolution and Pooling blocks, the two most basic building blocks of any CNN.
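To make the two building blocks concrete, here is a minimal NumPy sketch of a single-kernel 2D convolution (implemented, as deep learning libraries typically do, as cross-correlation) followed by 2x2 max pooling. The shapes, function names, and the edge-detector kernel are illustrative assumptions, not part of the article.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid cross-correlation of a 2D image with a 2D kernel."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Each output pixel is the kernel's dot product with one patch.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2x2(feature_map):
    """Non-overlapping 2x2 max pooling (trailing odd row/col dropped)."""
    h, w = feature_map.shape
    fm = feature_map[:h - h % 2, :w - w % 2]
    return fm.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)
edge_kernel = np.array([[1., 0., -1.]] * 3)  # a simple vertical-edge detector
features = conv2d(image, edge_kernel)        # shape (4, 4)
pooled = max_pool2x2(features)               # shape (2, 2)
```

The kernel slides over the image producing one response per spatial position, and pooling then summarizes each 2x2 neighborhood by its maximum, halving the resolution while keeping the strongest responses.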